ABSTRACT

The diffusion of information and the spread of rumors and infectious diseases are all instances of stochastic processes that occur over the edges of an underlying network. The networks over which contagions spread are frequently unobserved, and they are often dynamic, changing over time. In this paper, we investigate the problem of inferring dynamic networks based on information diffusion data. We assume there is an unobserved dynamic network that changes over time, while we observe the results of a dynamic process spreading over the edges of the network. The task then is to infer the edges and the dynamics of the underlying network.

We develop an on-line algorithm that relies on stochastic convex optimization to efficiently solve the dynamic network inference problem. We apply our algorithm to information diffusion among 3.3 million mainstream media and blog sites and experiment with more than 179 million different pieces of information spreading over the network in a one-year period. We study the evolution of information pathways in the online media space and find interesting insights. Information pathways for general recurrent topics are more stable across time than those for ongoing news events. Clusters of news media sites and blogs often emerge and vanish in a matter of days for ongoing news events. Major social movements and events involving the civil population, such as the Libyan civil war or Syria's uprising, lead to an increase in the number of information pathways among blogs, as well as to an overall increase in the network centrality of blogs and social media sites.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database applications - Data mining
General Terms: Algorithms, Experimentation.
Keywords: Networks of diffusion, Information cascades, Blogs, News media, Meme-tracking, Social networks.

1. INTRODUCTION 

Networks represent a fundamental medium for the spreading and diffusion of various types of behavior, information, rumors and diseases [27]. A contagion appears at some node of a network and then spreads like an epidemic from node to node over the edges of the underlying network. For example, in the case of information diffusion, the contagion represents a piece of information [16, 18], and infection events correspond to times when nodes mention or copy the information from one of their neighbors in the network. Similarly, we can think about the spread of a new type of behavior or an action, e.g., purchasing a new cellphone [15], or the propagation of a contagious disease over the edges of the underlying social network [6].

In the context of network diffusion, we often observe the temporal traces of diffusion while the pathways over which the contagion spreads remain hidden. In other words, we observe the times when each node gets infected by the contagion, but the edges of the network that gave rise to the diffusion remain unobservable. For example, we can often measure and observe the time when people decide to adopt a new behavior, while we do not explicitly observe which neighbor in the social network influenced them to do so. In the case of information diffusion, we often observe people (or media sites) talking about a new piece of information without explicitly observing the path it took in the information diffusion network to reach the particular node of interest. And epidemiologists often observe when a person gets sick but usually cannot tell who infected her. In all these examples, one can observe the infection events themselves without knowing over which edges of the network the contagions spread. Therefore, one of the fundamental research problems in the context of network diffusion is inferring the structure of the networks over which various types of contagions spread [10]. Moreover, the networks over which contagions diffuse are often not static but change over time. Depending on the type of contagion, the time of day, or the death of existing nodes and the birth of new ones, the underlying network may dynamically change and shift over time.

In recent years, several network inference algorithms have been developed [9, 10, 12, 20, 24, 30]. Some approaches infer only the network structure [10, 30], while others infer not only the network structure but also the strength or the average latency of every edge in the network [9, 20]. However, to the best of our knowledge, previous work has always assumed networks to be static and contagion pathways to be constant over time. In most cases, though, networks are dynamic, and contagion pathways change over time, depending upon the contagions that propagate through them [22, 28]. For example, a blog can increase its popularity abruptly after one of its posts goes viral; this may create new edges in the information transmission network, so the content the blog produces in the future will likely spread to larger parts of the network. Similarly, at any given time a particular unexpected event may occur, and a topic or piece of news may become very popular for a limited period of time. This again will lead to different emerging and vanishing information pathways, and thus to a time-varying underlying network. In order to better understand these temporal changes, one needs to reconstruct the time-varying structure and underlying temporal dynamics of these networks and then study the information pathways of real-world events, topics or content.

Table 1: Various models of edge transmission likelihood.

Our approach to time-varying network inference. In this paper we investigate the problem of inferring dynamic networks based on information diffusion data. We assume there is an unobserved dynamic network that changes over time, while we observe the node infection times of many different contagions spreading over the edges of the network. The task then is to infer the edges and the dynamics of the underlying network. We develop an efficient on-line dynamic network inference algorithm, InfoPath, that allows us to infer daily networks of information diffusion between online media sites over a one-year period using more than 179 million different contagions diffusing over the underlying media network.

We model diffusion processes as discrete networks of fully continuous temporal processes occurring at different rates, building on our previous work [9, 11]. Our model allows information to propagate at different rates across different edges by adopting a data-driven approach, where only the recorded temporal diffusion events are used. The model considers the information that propagates through the network due only to diffusion, while ignoring any external sources [22]. However, our original diffusion model considered only static networks [9]. Here, we generalize the model and develop a new inference method to support dynamic networks. Our time-varying network inference algorithm, InfoPath, uses stochastic gradient descent [26] to provide estimates of the time-varying structure and temporal dynamics of the inferred network. The framework enables us to study the temporal evolution of information pathways in the online media space.

We apply the InfoPath algorithm to synthetic as well as real Web information propagation data. We study 179 million different information cascades spreading among 3.3 million blog and news media sites over a one-year period, from March 2011 to February 2012. Results on synthetic data show that InfoPath is able to track changes in the topology of dynamic networks and provides accurate on-line estimates of the time-varying transmission rates of the edges of the network. InfoPath is also robust across network topologies and temporal trends of edge transmission dynamics.

Experiments on large-scale real news and social media data lead to interesting insights and findings. For example, we find that the information pathways over which general recurrent topics propagate remain more stable over time, while unexpected events lead to dramatically changing information pathways. Clusters of mainstream news and blogs often emerge and vanish in a matter of days, and our on-line algorithm is able to uncover such structures. News events that involve large-scale social movements, such as the Libyan civil war, Egypt's revolution or Syria's uprising, result in a greater increase in information transfer among blogs than among mainstream media. Perhaps surprisingly, the numbers of mainstream media sites and blogs among the most influential nodes for most topics or news events are comparable. However, we find that growing numbers of influential blogs on some topics or news events are often temporally correlated with large-scale social movements (e.g., the Occupy Wall Street movement in Sept-Nov 2011).

Further related work. Previous methods for inferring diffusion networks [9, 10, 12, 20] also use a generative probabilistic model for modeling cascading processes over networks. NetInf [10] and MultiTree [12] infer the network connectivity using submodular optimization. NetRate [9] and ConNIe [20] infer not only the network connectivity but also transmission rates of infection or prior probabilities of infection using convex optimization. Moreover, there have also been attempts to model information diffusion without assuming the existence of an underlying network [32, 33].

However, to the best of our knowledge, all previous approaches to network inference assume the network and the underlying dynamics of the edges to be constant, i.e., the network structure and the transmission rates of each edge do not change over time. Therefore, they consider the pathways over which information propagates to be time-invariant. The main contribution of this paper is to combine stochastic gradient descent and the diffusion model introduced in [9] to develop an efficient on-line network inference algorithm that provides time-varying estimates of the edges of a network and the transmission rates of each edge. This allows us to detect how information pathways emerge and vanish over time, and to identify when nodes produce highly viral content.

The remainder of the paper is organized as follows: in Section 2 we revisit the model of diffusion and state the dynamic network inference problem. Section 3 describes the proposed time-varying network inference method, called InfoPath. Section 4 evaluates InfoPath quantitatively and qualitatively using synthetic and real diffusion data. We conclude with a discussion of results in Section 5.



2. PROBLEM FORMULATION 

In this section, we build on our fully continuous-time model of diffusion [9, 11]. We start by briefly describing the generative model for the observed data. We then revisit how to compute the likelihood of a cascade using the model and state the continuous-time network inference problem for both static and dynamic networks. Throughout the section, we explicitly point out which assumptions of the original model need to be extended in order to support dynamic networks.

Observed data. For now, let us consider a single static directed network. Multiple contagions propagate over the edges of the network. As a contagion spreads from infected to non-infected nodes over the edges of the network, it creates a cascade. For each contagion c, we observe a cascade $t^c$, which is simply a record of observed node infection times during a time window of length $T^c$. In an information propagation setting, each cascade corresponds to a different piece of information, and the infection time of a node is simply the time when the node first mentioned the piece of information c.

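A cascade is an N-dimensional vector $t^c := (t_1^c, \dots, t_N^c)$ recording the times when each of the N nodes got infected by the contagion c, where $t_i^c \in [t_0, t_0 + T^c] \cup \{\infty\}$ and $t_0$ is the infection time of the first node. Generally, contagions do not infect all the nodes of the network; the symbol $\infty$ is used for nodes that were not infected by the contagion c during the observation window $[t_0, t_0 + T^c]$. Contagions often propagate simultaneously [21, 25] over the same network, but we assume that each contagion propagates independently of the others.

Model | Log survival function $\log S(t_i \mid t_j; \alpha_{j,i})$ | Hazard function $H(t_i \mid t_j; \alpha_{j,i})$ | Cascade gradient for uninfected $i$, $\nabla_{\alpha_{j,i}} L_c(A)$ | Cascade gradient for infected $i$, $\nabla_{\alpha_{j,i}} L_c(A)$
EXP | $-\alpha_{j,i}(t_i - t_j)$ | $\alpha_{j,i}$ | $T - t_j$ | $(t_i - t_j) - \frac{1}{\sum_{k: t_k < t_i} \alpha_{k,i}}$
POW | $-\alpha_{j,i} \log\frac{t_i - t_j}{\delta}$ | $\frac{\alpha_{j,i}}{t_i - t_j}$ | $\log\frac{T - t_j}{\delta}$ | $\log\frac{t_i - t_j}{\delta} - \frac{1/(t_i - t_j)}{\sum_{k: t_k < t_i} \alpha_{k,i}/(t_i - t_k)}$
RAY | $-\alpha_{j,i}\frac{(t_i - t_j)^2}{2}$ | $\alpha_{j,i}(t_i - t_j)$ | $\frac{(T - t_j)^2}{2}$ | $\frac{(t_i - t_j)^2}{2} - \frac{t_i - t_j}{\sum_{k: t_k < t_i} \alpha_{k,i}(t_i - t_k)}$

Table 2: Contagion transmission models for the three edge transmission likelihoods: Exponential, Power-law and Rayleigh. Gradients are those of the negative log-likelihood $L_c(A) = -\log f(t^c; A)$ of a cascade; $\delta$ denotes the minimum transmission delay of the power-law model.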

Given a set of node infection times of many different contagions, our goal is to infer the underlying dynamic network over which the contagions propagated. We apply the maximum likelihood principle in order to infer the network that most likely generated the observed data. We proceed by first assuming a static network and describing the generative model of information diffusion. We then generalize the model to dynamic networks.

Pairwise transmission likelihood. The first step in modeling diffusion dynamics is to consider pairwise node interactions. For every pair of nodes $j, i$, we define a pairwise transmission rate $\alpha_{j,i}$, which models how frequently information spreads from node j to node i, i.e., the strength of the edge $j \to i$. We consider the rather general case of heterogeneous pairwise transmission rates, i.e., infections can occur at different transmission rates over different edges of the network. As $\alpha_{j,i} \to 0$, the expected transmission time from node j to node i becomes arbitrarily long. In contrast with the original model [9], we will later allow the transmission rates $\alpha_{j,i}$ to change over time. In particular, we will allow the transmission rates $\alpha_{j,i}$ to change across cascades but not within a cascade. Allowing edge transmission rates to dynamically increase and decay over time will enable us to infer time-varying diffusion networks.

Next, we define $f(t_i \mid t_j; \alpha_{j,i})$ to be the conditional likelihood of transmission between node j and node i and assume it depends on the infection times $(t_j, t_i)$ and the edge transmission rate $\alpha_{j,i}$. We allow information to propagate only forward in time, i.e., node j that has been infected at time $t_j$ may infect node i at time $t_i$ only if $t_j < t_i$; otherwise $f(t_i \mid t_j; \alpha_{j,i}) = 0$. The shape of the conditional likelihood of transmission may depend on the particular setting (information, influence, diseases, etc.) in which propagation takes place. In some scenarios, it may be possible to estimate a non-parametric likelihood, while in others, expert knowledge may be used to decide upon a parametric model. For simplicity, we consider three well-known parametric models of edge transmission likelihood: exponential, power-law and Rayleigh, defined in Table 1. Exponential and power-law likelihoods have been used in modeling information propagation in social and information networks [9, 10, 11, 12, 20], while the Rayleigh likelihood has been used in previous work on disease spread in epidemiology [31]. In all three models, as $\alpha_{j,i} \to 0$, the likelihood of infection tends to zero.
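A minimal Python sketch of the three likelihoods, assuming the standard forms that are consistent with the log survival and hazard functions in Table 2. With $\Delta = t_i - t_j$: exponential $\alpha e^{-\alpha\Delta}$; power-law $\frac{\alpha}{\delta}(\Delta/\delta)^{-1-\alpha}$ for $\Delta > \delta$, where the minimum delay $\delta$ (`delta`) is an assumed parameter; and Rayleigh $\alpha\Delta e^{-\alpha\Delta^2/2}$. The function names are hypothetical.

```python
import math

# Pairwise transmission likelihoods f(t_i | t_j; alpha) for the three models.
# Assumed standard forms; `delta` is the (assumed) minimum delay of the
# power-law model. Each returns 0 unless t_i comes after t_j.


def f_exp(t_i: float, t_j: float, alpha: float) -> float:
    dt = t_i - t_j
    if dt <= 0:
        return 0.0  # information only propagates forward in time
    return alpha * math.exp(-alpha * dt)


def f_pow(t_i: float, t_j: float, alpha: float, delta: float = 1.0) -> float:
    dt = t_i - t_j
    if dt <= delta:
        return 0.0  # power-law support starts at the minimum delay
    return (alpha / delta) * (dt / delta) ** (-1.0 - alpha)


def f_ray(t_i: float, t_j: float, alpha: float) -> float:
    dt = t_i - t_j
    if dt <= 0:
        return 0.0
    return alpha * dt * math.exp(-0.5 * alpha * dt * dt)
```

Each function returns 0 when $t_i \le t_j$, mirroring the forward-in-time constraint, and each vanishes as $\alpha_{j,i} \to 0$.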

We recall some additional standard notation [14] that we will use in the remainder of the section. Given some node j, infected at time $t_j$, we define the survival function of edge $j \to i$ as $S(t_i \mid t_j; \alpha_{j,i}) = 1 - F(t_i \mid t_j; \alpha_{j,i})$, where $F(t_i \mid t_j; \alpha_{j,i}) = \int_{t_j}^{t_i} f(t \mid t_j; \alpha_{j,i}) \, dt$ is the cumulative distribution function of the transmission time, computed from the transmission likelihood. Finally, the hazard function, or instantaneous infection rate, of edge $j \to i$ is the ratio $H(t_i \mid t_j; \alpha_{j,i}) = f(t_i \mid t_j; \alpha_{j,i}) / S(t_i \mid t_j; \alpha_{j,i})$. We derive the log survival and hazard functions for the three edge transmission models in Table 2.
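The entries of Table 2 can be encoded directly, and the likelihood recovered through the identity $f = H \cdot S$. This is an illustrative sketch: the helper names and the power-law minimum delay `delta` are assumptions, and inputs must satisfy $t_i > t_j$ ($t_i - t_j > \delta$ for the power-law model).

```python
import math

# Log survival log S(t_i | t_j; alpha) and hazard H(t_i | t_j; alpha) for the
# three models, following Table 2. Valid for t_i - t_j > 0 (> delta for "pow").


def log_survival(model: str, t_i: float, t_j: float, alpha: float,
                 delta: float = 1.0) -> float:
    dt = t_i - t_j
    if model == "exp":
        return -alpha * dt
    if model == "pow":
        return -alpha * math.log(dt / delta)
    if model == "ray":
        return -alpha * dt * dt / 2.0
    raise ValueError(f"unknown model: {model}")


def hazard(model: str, t_i: float, t_j: float, alpha: float) -> float:
    dt = t_i - t_j
    if model == "exp":
        return alpha
    if model == "pow":
        return alpha / dt
    if model == "ray":
        return alpha * dt
    raise ValueError(f"unknown model: {model}")


def likelihood(model: str, t_i: float, t_j: float, alpha: float,
               delta: float = 1.0) -> float:
    """f = H * S, the identity that defines the hazard function."""
    return hazard(model, t_i, t_j, alpha) * math.exp(
        log_survival(model, t_i, t_j, alpha, delta))
```

Since $S = 1 - F$, the likelihood must also equal $-\,dS/dt_i$, which gives a quick numerical check of each row of the table.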



Likelihood of a cascade. Consider some node i in a directed network. Node i can get infected by any of its parents (i.e., nodes pointing to i). Once infected, node i can then also spread the contagion to its children (i.e., nodes i points to). As in the independent cascade model [13], we assume that a node gets infected once the first parent infects it (i.e., a node can get infected only once). Then, the likelihood of infection of node i at time $t_i$, given a collection of previously infected nodes $(t_1, \dots, t_N \mid t_k < t_i)$, results from summing over the likelihoods of the mutually disjoint events that each node is the first parent that generated the infection event of our node i:

$$ f(t_i \mid t_1, \dots, t_N \setminus t_i; A) = \sum_{j: t_j < t_i} f(t_i \mid t_j; \alpha_{j,i}) \prod_{k: t_k < t_i, \, k \neq j} S(t_i \mid t_k; \alpha_{k,i}), \tag{1} $$

where $A := \{\alpha_{j,i} \mid i, j = 1, \dots, N, \, i \neq j\}$. If we assume that infections are conditionally independent given the parents of the infected nodes, the likelihood of the infections in a cascade is:

$$ f(t^{\le T}; A) = \prod_{t_i \le T} \sum_{j: t_j < t_i} f(t_i \mid t_j; \alpha_{j,i}) \prod_{k: t_k < t_i, \, k \neq j} S(t_i \mid t_k; \alpha_{k,i}), \tag{2} $$

where $t^{\le T}$ denotes the vector of infection times of the nodes infected in the cascade up to time T. Removing the condition $k \neq j$ makes the inner product independent of j:

$$ f(t^{\le T}; A) = \prod_{t_i \le T} \prod_{k: t_k < t_i} S(t_i \mid t_k; \alpha_{k,i}) \sum_{j: t_j < t_i} \frac{f(t_i \mid t_j; \alpha_{j,i})}{S(t_i \mid t_j; \alpha_{j,i})}. \tag{3} $$

The fact that some nodes are not infected during the observation window is also informative. We therefore add multiplicative survival terms to Eq. 3 and rearrange with hazard functions:

\begin{align}
f(t^c; A) = \prod_{i: t_i \le T} & \prod_{m: t_m > T} S(T \mid t_i; \alpha_{i,m}) \times \tag{4} \\
& \prod_{k: t_k < t_i} S(t_i \mid t_k; \alpha_{k,i}) \sum_{j: t_j < t_i} H(t_i \mid t_j; \alpha_{j,i}). \tag{5}
\end{align}
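To make Eqs. 4-5 concrete, here is a hedged Python sketch that evaluates $\log f(t^c; A)$ for one cascade under the exponential model, where $\log S = -\alpha_{j,i}(t_i - t_j)$ and $H = \alpha_{j,i}$ (Table 2). Two choices belong to this sketch rather than to the paper: rates of pairs absent from the dictionary are treated as zero, and the first infected node, which has no earlier parent, contributes no hazard term (i.e., we condition on the initial infection).

```python
import math
from typing import Dict, List, Tuple


def cascade_log_likelihood(times: List[float], A: Dict[Tuple[int, int], float],
                           T: float) -> float:
    """log f(t^c; A) from Eqs. (4)-(5) under the exponential model.

    times[i] is node i's infection time (math.inf if uninfected), A maps an
    edge (j, i) to alpha_{j,i}, and T is the end of the observation window.
    """
    n = len(times)
    ll = 0.0
    for i in range(n):
        t_i = times[i]
        if t_i > T:
            continue  # only infected nodes index the outer product
        # Survival terms S(T | t_i; alpha_{i,m}) towards uninfected nodes m.
        for m in range(n):
            if times[m] > T:
                ll += -A.get((i, m), 0.0) * (T - t_i)
        parents = [k for k in range(n) if times[k] < t_i]
        if not parents:
            continue  # first infected node: conditioned on, no hazard term
        # Survival terms S(t_i | t_k; alpha_{k,i}) from earlier nodes k.
        ll += sum(-A.get((k, i), 0.0) * (t_i - times[k]) for k in parents)
        # log of the summed hazards; the exponential hazard is alpha_{j,i}.
        total_hazard = sum(A.get((j, i), 0.0) for j in parents)
        ll += math.log(total_hazard) if total_hazard > 0 else -math.inf
    return ll
```

For example, with times `[0.0, 1.0, math.inf]`, rates `{(0, 1): 0.5, (0, 2): 0.2, (1, 2): 0.3}` and `T = 2.0`, the function returns $-0.4 - 0.3 - 0.5 + \log 0.5 \approx -1.893$.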

Algorithm 1 InfoPath: the dynamic network inference algorithm
Require: $C^t, t, K, T, \rho$
while $k < K$ do
    $c_k \leftarrow$ cascade-sampling($C^t, t, T$);
    for all $(j, i)$ : $\alpha_{j,i}^{k-1}(t) > 0$, $t_j^{c_k} \neq \infty$ do
        $\alpha_{j,i}^{k}(t) = \big(\alpha_{j,i}^{k-1}(t) - \gamma_k \nabla_{\alpha_{j,i}} L_{c_k}(A^{k-1}(t))\big)_+$;
    end for
    $k = k + 1$;
end while
return $A^K$;

Perhaps surprisingly, our continuous-time model of diffusion is a particular case of Aalen's additive regression model, frequently used in survival analysis [3]. In Aalen's model, the hazard function, or instantaneous infection rate, of node i is parametrized as $\alpha_{i,0}(t) + \alpha_i(t)^\top s_i(t)$, where $\alpha_i(t)$ is a vector that accounts for the effect of a collection of observable covariates $s_i(t)$ and $\alpha_{i,0}(t)$ is a baseline. It is easy to show that the hazard function of node i at time $t_i$, for the three pairwise transmission models (exponential, power-law and Rayleigh), has the following form:

$$ H(t_i \mid t_1, \dots, t_N \setminus t_i; A) = \alpha_i^\top s_i(t_i; t_1, \dots, t_N \setminus t_i), \tag{6} $$

where $\alpha_i = (\alpha_{1,i}, \dots, \alpha_{N,i})$ accounts for the effect of a collection of observable covariates $s_i(t_i; t_1, \dots, t_N \setminus t_i)$, the covariates depend on the pairwise transmission model (exponential, power-law or Rayleigh) and on the previously infected nodes, and the baseline is zero.
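In the additive form of Eq. 6, the covariate attached to parent j is the hazard of edge $j \to i$ per unit rate, read off Table 2: 1 for exponential, $1/(t_i - t_j)$ for power-law, $t_i - t_j$ for Rayleigh, and 0 if node j was not infected before $t_i$. A small illustrative sketch; the function names are hypothetical.

```python
from typing import List


def covariates(model: str, t_i: float, times: List[float]) -> List[float]:
    """Covariate vector s_i(t_i; t_1, ..., t_N) such that H = alpha_i . s_i.

    Entry j is H(t_i | t_j; alpha) / alpha from Table 2, or 0 when node j was
    not infected strictly before t_i (including never infected, t_j = inf).
    """
    s = []
    for t_j in times:
        dt = t_i - t_j
        if not dt > 0:
            s.append(0.0)
        elif model == "exp":
            s.append(1.0)
        elif model == "pow":
            s.append(1.0 / dt)
        elif model == "ray":
            s.append(dt)
        else:
            raise ValueError(f"unknown model: {model}")
    return s


def node_hazard(alpha_i: List[float], s_i: List[float]) -> float:
    """Aalen's additive hazard with zero baseline, alpha_{i,0}(t) = 0."""
    return sum(a * s for a, s in zip(alpha_i, s_i))
```

Because the node hazard is linear in $\alpha_i$, its logarithm, which appears in Eq. 5, is concave in the rates.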

Dynamic network inference problem. Given a static network with constant edge transmission rates $\alpha_{j,i}$, the network inference problem reduces to solving a maximum likelihood problem over the set of recorded cascades $C = \{t^{c_1}, \dots, t^{c_{|C|}}\}$:

$$ \begin{aligned} \underset{A}{\text{maximize}} \quad & \sum_{c \in C} \log f(t^c; A) \\ \text{subject to} \quad & \alpha_{j,i} \ge 0, \quad i, j = 1, \dots, N, \; i \neq j, \end{aligned} \tag{7} $$

where $A := \{\alpha_{j,i} \mid i, j = 1, \dots, N, \, i \neq j\}$ are the edge transmission rates we aim to infer. The edges of the network are those pairs of nodes with transmission rates $\alpha_{j,i} > 0$.

Now we generalize the network inference problem to dynamic networks with edge transmission rates $\alpha_{j,i}(t)$ that may change over time. To this aim, at any given time t, we solve a maximum weighted likelihood problem over the set of cascades recorded by time t, $C^t = \{t^{c_1}, \dots, t^{c_{|C^t|}}\}$:

$$ \begin{aligned} \underset{A(t)}{\text{maximize}} \quad & \sum_{c \in C^t} w_c(t) \log f(t^c; A(t)) \\ \text{subject to} \quad & \alpha_{j,i}(t) \ge 0, \quad i, j = 1, \dots, N, \; i \neq j, \end{aligned} \tag{8} $$

where $w_c(t) \ge 0$ is a weight that modulates the importance of cascade c based on how old it is at time t, and $A(t) := \{\alpha_{j,i}(t) \mid i, j = 1, \dots, N, \, i \neq j\}$ are the variables. The intuition here is that the diffusion network changes smoothly over time and that recent cascades are more important than old ones in determining the current network structure. Thus, at any point in time, we can solve the above optimization problem to obtain the structure of the diffusion network at that particular time. Next, we show how to efficiently solve the above optimization problem for all time points t.

3. THE INFOPATH ALGORITHM 

The problem defined by Eq. 8 is convex for the three transmission models we consider. Therefore, we can aim to find a globally optimal solution at any given time point t:

Theorem 1 ([9]). Given log-concave survival functions and concave hazard functions in the parameter(s) of the pairwise transmission likelihoods, the network inference problem defined by Eq. 8 is convex in A.

Stochastic gradient (SG) methods have been shown to be extremely successful at taking advantage of the structure exhibited by the optimization problem stated in Eq. 8, whose objective decomposes into a sum of per-cascade terms. They have received increasing attention in the machine learning literature [4, 5, 7, 8, 29]. Although many optimization methods based on stochastic gradient descent have been proposed, we have found that in practice the basic projected stochastic gradient method [26] works well for our problem. Other, more sophisticated methods, like the stochastic average gradient [29] or the incremental average gradient [7], do not offer a significant advantage. Therefore, we proceed with the basic stochastic gradient method in the remainder of the paper.

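Projected Stochastic Gradient. The projected stochastic gradient method [26] uses iterations of the form:

$$ \alpha_{j,i}^{k}(t) = \left( \alpha_{j,i}^{k-1}(t) - \gamma_k \nabla_{\alpha_{j,i}} L_{c_k}(A^{k-1}(t)) \right)_+, \tag{9} $$

where $\nabla_{\alpha_{j,i}} L_{c_k}(\cdot)$ is the gradient of the negative log-likelihood $L_{c_k}(\cdot) = -\log f(t^{c_k}; \cdot)$ of cascade $c_k$ with respect to the edge transmission rate $\alpha_{j,i}$, $\gamma_k$ is a step size, $(z)_+ = \max(0, z)$, and cascade $c_k$ is sampled (with replacement) from $C^t$. The gradients for all three edge transmission models are given in Table 2.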
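A minimal sketch of one iteration of Eq. 9 for the exponential model, using the Table 2 gradients of $L_c = -\log f(t^c; A)$: $T - t_j$ for an uninfected child and $(t_i - t_j) - 1/\sum_{k: t_k < t_i} \alpha_{k,i}$ for an infected one. The function names, the dictionary representation of A, and the optional `floor` argument are assumptions of this sketch; it raises an error in the zero-hazard case, where the objective is unbounded.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (j, i): edge from node j to node i


def exp_gradient(times: List[float], A: Dict[Edge, float],
                 T: float) -> Dict[Edge, float]:
    """Gradient of L_c = -log f(t^c; A) for the exponential model (Table 2).

    times[i] is node i's infection time (float('inf') if uninfected). Only
    edges whose source node j was infected in the cascade get an entry.
    """
    n = len(times)
    grad: Dict[Edge, float] = {}
    for i, t_i in enumerate(times):
        if t_i > T:
            # Uninfected child: gradient is T - t_j for every infected j.
            for j in range(n):
                if times[j] <= T:
                    grad[(j, i)] = T - times[j]
            continue
        parents = [j for j in range(n) if times[j] < t_i]
        if not parents:
            continue  # first infected node: nothing to explain
        total = sum(A.get((j, i), 0.0) for j in parents)
        if total <= 0.0:
            raise ValueError(f"node {i} has zero total hazard; objective unbounded")
        for j in parents:
            grad[(j, i)] = (t_i - times[j]) - 1.0 / total
    return grad


def projected_sg_step(times: List[float], A: Dict[Edge, float], T: float,
                      gamma: float, floor: float = 0.0) -> Dict[Edge, float]:
    """Eq. (9): alpha_{j,i} <- (alpha_{j,i} - gamma * grad_{j,i})_+.

    With floor=0.0 this is the plain projection max(0, z); a small positive
    floor gives a lower-bounded variant.
    """
    new_A = dict(A)
    for edge, g in exp_gradient(times, A, T).items():
        new_A[edge] = max(floor, A.get(edge, 0.0) - gamma * g)
    return new_A
```

Only edges whose source was infected in the sampled cascade are touched, which keeps the cost of an iteration proportional to the size of the cascade rather than to the whole network.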

Note that instead of using all historic data and then explicitly penalizing each cascade by a different weighting factor $w_c(t)$, we use a different, more scalable approach. We sample cascades with replacement, where the probability of a cascade being sampled decays with the age of the cascade. This way, recent cascades get sampled more often and thus implicitly hold higher importance when inferring the network. In practice, we achieve a significant speedup by using this approach. Moreover, in our dynamic network inference problem, the edge transmission rates usually vary smoothly. This makes stochastic gradient descent particularly well suited, since we can use the inferred network from the previous time step as the initialization for the inference procedure in the current time step. We find that setting the starting point $\alpha_{j,i}^0$ of each edge transmission rate $\alpha_{j,i}$ to the last outputted estimate of that rate allows us to further speed up the algorithm.

Importantly, in each iteration k of the projected stochastic gradient method, we only need to compute the gradients $\nabla_{\alpha_{j,i}} L_{c_k}(A^k)$ for edges $(j, i)$ such that node j has been infected in cascade $c_k$, and the iteration cost and convergence rate are independent of $|C|$ [5, 23]. Rigorous theoretical analysis of convergence turns out to be a challenging problem, which we leave for future work. However, we point out that standard analyses [26] typically assume the gradients $\nabla_{\alpha} L_c(A^k)$ to be either bounded above by a constant M, i.e., $\|\nabla_{\alpha} L_c(A)\| \le M$, or Lipschitz-continuous with constant L, i.e., $\|\nabla_{\alpha} L_c(A_2) - \nabla_{\alpha} L_c(A_1)\| \le L \|A_2 - A_1\|$. In our case, these conditions are violated if at some iteration k there is a node i infected in cascade $c_k$ such that $H(t_i^{c_k} \mid t_j^{c_k}; \alpha_{j,i}^k) = 0$ for all $j: t_j^{c_k} < t_i^{c_k}$, i.e., node i has no parents that explain its infection at $t_i^{c_k}$, in which case the objective function is positively unbounded. In practice, we obtain good performance and avoid such scenarios by bounding each feasible transmission rate from below, $\alpha_{j,i} \ge \epsilon$. An edge transmission rate $\alpha_{j,i}$ is feasible if there is at least one cascade in which both node j and node i get infected. When outputting the final solution, we simply omit edges with transmission rates equal to $\epsilon$.
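The feasibility rule and the lower bound $\epsilon$ can be sketched as follows, assuming rates are stored in a dictionary keyed by (j, i) and that infeasible edges are simply not stored (an implementation choice of this sketch, not necessarily of InfoPath). Feasibility only checks that both endpoints are infected in a common cascade, exactly as stated above; the function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (j, i): edge from node j to node i


def feasible_edges(cascades: Iterable[List[float]], T: float) -> Set[Edge]:
    """Edges (j, i) such that j and i are both infected in some cascade."""
    edges: Set[Edge] = set()
    for times in cascades:
        infected = [v for v, t in enumerate(times) if t <= T]
        edges.update(permutations(infected, 2))  # all ordered pairs j != i
    return edges


def clip_to_floor(A: Dict[Edge, float], feasible: Set[Edge],
                  eps: float) -> Dict[Edge, float]:
    """Keep only feasible edges and bound each rate below: alpha_{j,i} >= eps."""
    return {e: max(eps, A.get(e, eps)) for e in feasible}


def output_network(A: Dict[Edge, float], eps: float) -> Dict[Edge, float]:
    """Final solution: omit edges whose rate sits at the lower bound eps."""
    return {e: a for e, a in A.items() if a > eps}
```

Keeping every feasible rate at or above $\epsilon$ guarantees that each infected node (other than the first) has a strictly positive total hazard, so the logarithm in Eq. 5 stays finite.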

Figure 1: True and inferred edge transmission rates for edges with four different transmission rate evolution patterns: (a) Slab, (b) Square, (c) Chainsaw, (d) Hump. Results are for the Kronecker core-periphery network with exponential edge transmission model for 200 time units with 1,000 cascades per time unit. Our InfoPath method is able to track the evolving edge transmission rates over time. InfoPath works better for continuously evolving edge transmission rates (c, d).

Aging edges. Suppose we solve the dynamic network inference problem for a given time t using the projected SG method. In each iteration k, we only update the edge transmission rate $\alpha_{j,i}^k$ if node j has been infected in cascade $c_k$. Now, suppose a few edge transmission rates $\alpha_{j,i}$ are greater than zero for a given node j, i.e., their last outputted estimates before t have been positive. Then, if node j turns out to be infected in many sampled cascades at time t but never transmits information to any of its neighbors i, the edge transmission rates $\alpha_{j,i}$ will eventually converge to zero for large k. However, if node j is not infected in any of the sampled cascades, then none of its edge transmission rates $\alpha_{j,i}$ will be updated, and they will remain positive. So, if node j is never infected in subsequent cascades, the transmission rates $\alpha_{j,i}$ will remain positive forever. However, we would like these unused edges to decay and eventually vanish, or equivalently, the transmission rates $\alpha_{j,i}$ to converge to zero. To achieve this, we multiply the transmission rates of unused edges by an aging factor $\rho$ every time we solve the dynamic network inference problem. We use $\rho = 0.95$ in all experiments.
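A hedged sketch of the aging step, interpreting an unused edge as one whose source node j was not infected in any cascade sampled for the current time step. The function name and data layout are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (j, i): edge from node j to node i


def age_unused_edges(A: Dict[Edge, float], sampled: Iterable[List[float]],
                     T: float, rho: float = 0.95) -> Dict[Edge, float]:
    """Multiply alpha_{j,i} by rho for every edge whose source node j was not
    infected in any of the cascades sampled for the current time step."""
    active: Set[int] = set()
    for times in sampled:
        active.update(v for v, t in enumerate(times) if t <= T)
    return {(j, i): (a if j in active else rho * a) for (j, i), a in A.items()}
```

With $\rho = 0.95$, an edge whose source stays silent keeps $0.95^{14} \approx 0.49$ of its rate after 14 time steps, so stale edges fade within a few weeks of daily networks.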

Cascade sampling. In Eq. 9, instead of sampling cascades uniformly at random and explicitly penalizing each cascade by a different weighting factor $w_c(t)$, we achieve a significant speedup by sampling cascades with a procedure that penalizes old cascades and setting $w_c(t) = 1$ for all cascades. There are many possible sampling schemes. In practice, and for simplicity, we use windowed uniform sampling or windowed exponential sampling. Windowed means that when solving the network inference problem for time t, we only sample (uniformly or exponentially) cascades that started in the sampling time window $(t - T, t)$. Here, we find an important tradeoff. The shorter the sampling time window T, the more quickly the stochastic gradient method tracks changes in the edge transmission rates. However, a short sampling time window results in less reliable estimates, because cascades are sampled from a smaller universe of distinct cascades. Therefore, in order to track changes quickly, we need to observe many cascades over time.
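Both sampling schemes can be sketched with Python's `random.choices`, which samples with replacement. The exponential weighting $e^{-\lambda \cdot \mathrm{age}}$ and its rate `decay` ($\lambda$) are assumptions used here for illustration, since the text does not fix the exact form of the exponential scheme; the function name is hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def windowed_sample(start_times: Sequence[float], t: float, window: float,
                    k: int, scheme: str = "uniform", decay: float = 1.0,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Sample k cascade indices, with replacement, among the cascades that
    started in the window (t - window, t).

    "uniform": every cascade in the window is equally likely.
    "exponential": weight exp(-decay * age), with age = t - start time, so
    recent cascades are sampled more often.
    """
    rng = rng or random.Random()
    idx = [c for c, s in enumerate(start_times) if t - window < s < t]
    if not idx:
        return []
    if scheme == "uniform":
        weights = None
    elif scheme == "exponential":
        weights = [math.exp(-decay * (t - start_times[c])) for c in idx]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return rng.choices(idx, weights=weights, k=k)
```

A larger `decay` concentrates the samples on the newest cascades, which plays the same role as shortening the window: faster tracking at the cost of fewer distinct cascades.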

4. EXPERIMENTAL EVALUATION 

We evaluate the performance of InfoPath on time-varying synthetic networks that mimic the structure of real networks, as well as on a dataset of more than 179 million information cascades extracted from 300 million blog posts and news articles from 3.3 million media sites over a period of one year, from March 2011 to February 2012. All the data, code and additional results are available at the supporting website [1].

4.1 Experiments on synthetic data 

The goal of the experiments with synthetic data is to understand how temporal changes in a network affect the performance of our algorithm. We aim not only to detect when an edge appears (i.e., its transmission rate becomes > 0) or disappears (i.e., its transmission rate becomes 0), but also to provide instantaneous transmission rate estimates that track the true edge transmission rates over time.

Experimental setup. First, we generate synthetic networks using Kronecker graph models of directed real-world networks [17]. For all our experiments, we consider two different Kronecker networks, both with 1,024 nodes and 2,048 edges: a core-periphery Kronecker network with parameter matrix [0.9, 0.5; 0.5, 0.3] and a hierarchical Kronecker network with parameter matrix [0.9, 0.1; 0.1, 0.9].
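Kronecker graphs with a 2x2 initiator can be sampled by the "ball dropping" procedure sketched below; with 10 levels it yields node ids in $[0, 1024)$. Skipping self-loops and collecting distinct edges until 2,048 are found are choices of this sketch, and the generator used for the experiments may differ; the function name is hypothetical.

```python
import random
from typing import List, Set, Tuple


def kronecker_edges(theta: List[List[float]], levels: int, num_edges: int,
                    rng: random.Random) -> Set[Tuple[int, int]]:
    """Sample num_edges distinct directed edges of a stochastic Kronecker graph
    with 2**levels nodes by "ball dropping".

    At each level, one cell (a, b) of the 2x2 initiator theta is chosen with
    probability proportional to theta[a][b], and the bits a and b are
    appended to the source and target node ids.
    """
    cells = [(a, b) for a in range(2) for b in range(2)]
    weights = [theta[a][b] for a, b in cells]
    edges: Set[Tuple[int, int]] = set()
    while len(edges) < num_edges:
        u = v = 0
        for _ in range(levels):
            a, b = rng.choices(cells, weights=weights)[0]
            u, v = 2 * u + a, 2 * v + b
        if u != v:  # skip self-loops
            edges.add((u, v))
    return edges


core_periphery = [[0.9, 0.5], [0.5, 0.3]]
hierarchical = [[0.9, 0.1], [0.1, 0.9]]
```

For instance, `kronecker_edges(core_periphery, 10, 2048, random.Random(1))` returns a 1,024-node, 2,048-edge directed network whose low-id nodes form a dense core.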

The next step is to make each edge follow a particular edge transmission rate evolution pattern. Our goal later will be to recover the network as well as the evolution of the transmission rate of each individual edge.

We consider five edge evolution patterns: Slab, Square, Chainsaw, Hump and Constant (see Figure 1). Slab and Hump patterns model outgoing connections of sites that become popular for a short period of time. Square and Chainsaw patterns model incoming connections to sites that perform updates periodically at specific times of the day or days of the week. The Constant pattern represents connections between sites that interact at any time and over a long period of time, usually large media sites. We consider Chainsaw, Hump and Constant to be examples of Type I patterns, without discontinuities, and Slab and Square to be examples of Type II patterns, with discontinuities.

We assign to each edge in the network an evolution pattern chosen uniformly at random from the set of the above five patterns. Then, we generate transmission rate values $\alpha_{j,i}(t)$ for each edge according to its chosen evolution pattern. The evolving edge transmission rate $\alpha_{j,i}(t)$ models how quickly information spreads from one node to another. Finally, we generate 1,000 information cascades per time step. For each cascade, we randomly pick the cascade initiator node.
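The cascade generation step can be sketched as a continuous-time independent cascade: every edge $j \to i$ draws one exponential delay with rate $\alpha_{j,i}$, and each node is infected by its first arriving parent, computed Dijkstra-style with a heap. The uniform choice of initiator, the exponential model, and the window handling are assumptions of this illustration; the function name is hypothetical.

```python
import heapq
import math
import random
from typing import Dict, List, Tuple


def simulate_cascade(n: int, A: Dict[Tuple[int, int], float], T: float,
                     rng: random.Random) -> List[float]:
    """Simulate one cascade under the exponential transmission model.

    A uniformly random initiator is infected at time 0. Each edge j -> i
    draws a delay tau ~ Exp(alpha_{j,i}) once j is infected, and node i's
    infection time is the earliest arrival over its infected parents (first
    parent wins). Nodes not reached within the window T stay at math.inf.
    """
    out: Dict[int, List[Tuple[int, float]]] = {}
    for (j, i), a in A.items():
        if a > 0:
            out.setdefault(j, []).append((i, a))
    times = [math.inf] * n
    source = rng.randrange(n)
    times[source] = 0.0
    heap = [(0.0, source)]
    done = [False] * n
    while heap:
        t_j, j = heapq.heappop(heap)
        if done[j]:
            continue  # stale heap entry; j already has its final time
        done[j] = True
        for i, a in out.get(j, []):
            t_i = t_j + rng.expovariate(a)  # mean delay 1 / alpha_{j,i}
            if t_i <= T and t_i < times[i]:
                times[i] = t_i
                heapq.heappush(heap, (t_i, i))
    return times
```

Because delays are non-negative, a node's time is final when it is popped from the heap, so each edge draws exactly one delay per cascade.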

Given the node infection times from the recorded cascades, our goal is then to find the true edges of the network and, for each edge, to discover its transmission rate evolution pattern, in other words, to infer how each edge transmission rate $\alpha(t)$ evolves over time. Figure 1 shows the true and inferred edge transmission rates for four different edges, each with a different evolution pattern: Slab, Square, Chainsaw and Hump. Observe that InfoPath is able to track the evolving edge transmission rates over time for all evolution patterns. InfoPath gives near-perfect performance when the edge transmission rate evolves continuously (Chainsaw, Hump). Interestingly, even when the edge transmission rate evolves discontinuously (Slab, Square), InfoPath manages to track it.

Figure 2: Precision and Recall (P-R), Accuracy and Mean Squared Error (MSE) of our InfoPath method against time. (a, c, e) Core-periphery (C-P) Kronecker network with exponential edge transmission model; (b, d, f) Hierarchical (HI) Kronecker network with Rayleigh edge transmission model. Performance on Type I (Chainsaw, Hump) and Type II (Slab, Square) edge transmission rate evolution patterns is plotted.

Accuracy of InfoPath. We evaluate the InfoPath method quantitatively by computing four different measures for every time step: Precision, Recall and Accuracy of inferred edges, as well as the Mean Squared Error (MSE) in the edge transmission rates. Precision at time t is the fraction of edges in the inferred network $\hat{G}(t)$ present in the true network $G^*(t)$. Recall at time t is the fraction of edges of the true network $G^*(t)$ present in the inferred network $\hat{G}(t)$. Accuracy at time t is defined as

$$ 1 - \frac{\sum_{i,j} \left| I(\alpha^*_{i,j}(t)) - I(\hat{\alpha}_{i,j}(t)) \right|}{\sum_{i,j} \left( I(\alpha^*_{i,j}(t)) + I(\hat{\alpha}_{i,j}(t)) \right)}, $$

where $\alpha^*(t)$ is the true transmission rate at time t, $\hat{\alpha}(t)$ is the estimated transmission rate at time t, and $I(\alpha(t)) = 1$ if $\alpha(t) > 0$ and $I(\alpha(t)) = 0$ otherwise. Inferred networks with no edges or only false edges would have zero accuracy. Last, the Mean Squared Error (MSE) at time t is defined as $E[|\alpha^*(t) - \hat{\alpha}(t)|^2]$, where $\alpha^*(t)$ is the true edge transmission rate at time t and $\hat{\alpha}(t)$ is the estimated transmission rate.
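The four measures can be computed from dictionaries of true and estimated rates as in the sketch below. The accuracy numerator $\sum_{i,j} |I(\alpha^*_{i,j}(t)) - I(\hat{\alpha}_{i,j}(t))|$ is simply the size of the symmetric difference of the two edge sets. Averaging the MSE over the union of pairs present in either dictionary, and returning NaN when a ratio is undefined, are assumptions of this sketch; the function name is hypothetical.

```python
import math
from typing import Dict, Tuple

Edge = Tuple[int, int]


def evaluate(true: Dict[Edge, float], est: Dict[Edge, float]) -> Dict[str, float]:
    """Precision, recall, accuracy and MSE at one time step.

    Missing pairs count as rate 0; an edge is present when its rate is > 0.
    """
    g_true = {e for e, a in true.items() if a > 0}  # I(alpha*) = 1
    g_est = {e for e, a in est.items() if a > 0}    # I(alpha_hat) = 1
    tp = len(g_true & g_est)
    precision = tp / len(g_est) if g_est else math.nan
    recall = tp / len(g_true) if g_true else math.nan
    denom = len(g_true) + len(g_est)
    # sum |I(a*) - I(a_hat)| equals the size of the symmetric difference.
    accuracy = 1.0 - len(g_true ^ g_est) / denom if denom else math.nan
    pairs = set(true) | set(est)
    mse = (sum((true.get(e, 0.0) - est.get(e, 0.0)) ** 2 for e in pairs)
           / len(pairs)) if pairs else math.nan
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "mse": mse}
```

An empty inferred network yields a symmetric difference equal to the true edge set, so its accuracy is exactly zero, matching the remark above.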

Figure [2] shows Precision, Recall, Accuracy, and MSE over time 
for the time-varying core-periphery Kronecker network with expo- 
nential edge transmission model, and hierarchical Kronecker net- 
work with Rayleigh edge transmission model. Observe that the 
performance of our method is stable across time, and as mentioned 
before, continuous evolution patterns are easier to track and esti- 
mate than discontinuous ones. 
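The four measures above can be computed per time step as in the following minimal sketch. It assumes both networks are given as dictionaries mapping a directed pair (j, i) to its transmission rate at time t, with missing pairs meaning rate 0; the function names are hypothetical, and averaging the squared error over all pairs present in either network is our own assumption about how the expectation in the MSE is normalized.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # directed pair (j, i): j can infect i


def indicator(rate: float) -> int:
    """I(alpha) = 1 if alpha > 0, and 0 otherwise."""
    return 1 if rate > 0 else 0


def evaluate_snapshot(true_rates: Dict[Edge, float],
                      est_rates: Dict[Edge, float]) -> Dict[str, float]:
    """Precision, Recall, Accuracy and MSE for a single time step t.

    Pairs missing from a dict are treated as having rate 0 (no edge).
    """
    pairs = set(true_rates) | set(est_rates)
    true_edges = {e for e in pairs if indicator(true_rates.get(e, 0.0))}
    est_edges = {e for e in pairs if indicator(est_rates.get(e, 0.0))}
    hits = len(true_edges & est_edges)

    precision = hits / len(est_edges) if est_edges else 0.0
    recall = hits / len(true_edges) if true_edges else 0.0

    # sum_ij |I(a*_ij) - I(a^_ij)| counts edges present in exactly one network.
    mismatches = len(true_edges ^ est_edges)
    denom = len(true_edges) + len(est_edges)
    accuracy = 1.0 - mismatches / denom if denom else 1.0

    # Assumption: the expectation is approximated by averaging the squared
    # error over every pair that appears in either network.
    sq_err = sum((true_rates.get(e, 0.0) - est_rates.get(e, 0.0)) ** 2
                 for e in pairs)
    mse = sq_err / len(pairs) if pairs else 0.0

    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "mse": mse}


if __name__ == "__main__":
    truth = {("a", "b"): 1.0, ("b", "c"): 0.5}
    estimate = {("a", "b"): 0.8, ("c", "a"): 0.2}
    print(evaluate_snapshot(truth, estimate))
```

Calling `evaluate_snapshot` once per day yields the curves of the kind plotted in Figure 2; an empty estimate, or one containing only false edges, gives an accuracy of 0, matching the remark after the definition.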

Accuracy vs. running time in static networks. Our stochastic
gradient descent based method, InfoPath, can also be used to
speed up inference of static networks. In such a scenario, stochastic
gradient descent processes cascades in a random round-robin fashion.
Here, we compare InfoPath to the state-of-the-art methods
for inference of static networks: NetInf [10] and NetRate [?].
First, we compare the methods by computing the accuracy against
running time. Second, we compare InfoPath to NetRate in
terms of the mean squared error of the estimated transmission rates
against running time. We omit NetInf from this last comparison
since it only infers the network structure (and no edge transmission
rates). For the sake of fairness in the running time comparison,
we implemented all methods in C++. Our C++ implementation of
NetRate is much faster than the public Matlab implementation.
Figure 3 compares Accuracy and MSE against running time for
the three network inference methods for a static core-periphery
Kronecker network. InfoPath is about 10 to 100 times faster
than NetRate and as fast as NetInf, while achieving the same
accuracy as NetRate. Importantly, InfoPath and NetRate always
improve accuracy with running time, until convergence.
In terms of MSE, InfoPath reaches lower MSE values much
more quickly than NetRate.

Figure 3: Accuracy and Mean Squared Error (MSE) against
running time for a 1,024-node, 2,048-edge time-invariant
core-periphery Kronecker network with power-law edge transmission
model and 5,000 cascades. Longer running times mean the
algorithms run for more iterations. InfoPath and NetRate
improve accuracy until convergence. However, InfoPath
achieves the same level of performance 10 to 100 times faster.
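To make the random round-robin idea concrete, here is a minimal sketch of projected stochastic gradient descent for a static network. It assumes an exponential edge transmission model and a NetRate-style convex log-likelihood per cascade: survival terms for every potential parent, plus the log of the total hazard for each infected node. The function names, the step size `lr`, the initial rate, and the `min_hazard` floor are illustrative assumptions, not InfoPath's actual implementation.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Dict, List, Sequence, Tuple

Cascade = Dict[str, float]  # node -> infection time (infected nodes only)
Rates = DefaultDict[Tuple[str, str], float]


def cascade_gradient(cascade: Cascade, nodes: Sequence[str],
                     alpha: Rates, horizon: float,
                     min_hazard: float = 1e-3) -> Dict[Tuple[str, str], float]:
    """Gradient of one cascade's log-likelihood w.r.t. alpha[(j, i)],
    assuming an exponential edge transmission model."""
    grad: Dict[Tuple[str, str], float] = defaultdict(float)
    for i in nodes:
        if i in cascade:
            t_i = cascade[i]
            parents = [(j, t_j) for j, t_j in cascade.items() if t_j < t_i]
            if not parents:
                continue  # the earliest node has no likelihood term
            # Numerical safeguard (our assumption): floor the total hazard.
            hazard = max(sum(alpha[(j, i)] for j, _ in parents), min_hazard)
            for j, t_j in parents:
                # derivative of -alpha * (t_i - t_j) plus that of log(hazard)
                grad[(j, i)] += -(t_i - t_j) + 1.0 / hazard
        else:
            # i survived every infected node until the window closes at horizon
            for j, t_j in cascade.items():
                grad[(j, i)] += -(horizon - t_j)
    return grad


def infer_static_network(cascades: List[Cascade], nodes: Sequence[str],
                         horizon: float, epochs: int = 50, lr: float = 0.05,
                         init_rate: float = 0.1,
                         seed: int = 0) -> Dict[Tuple[str, str], float]:
    """Projected SGD that visits the cascades in random round-robin order."""
    rng = random.Random(seed)
    alpha: Rates = defaultdict(lambda: init_rate)
    order = list(range(len(cascades)))
    for _ in range(epochs):
        rng.shuffle(order)  # random round-robin: each cascade once per epoch
        for k in order:
            grad = cascade_gradient(cascades[k], nodes, alpha, horizon)
            for edge, g in grad.items():
                # descent step on the negative log-likelihood, then project
                # onto the feasible set alpha >= 0
                alpha[edge] = max(0.0, alpha[edge] + lr * g)
    return dict(alpha)


if __name__ == "__main__":
    data = [{"a": 0.0, "b": 1.0} for _ in range(10)]
    print(infer_static_network(data, ["a", "b", "c"], horizon=10.0))
```

In this toy run, b is always infected one time unit after a, so the estimated rate of (a, b) approaches 1, while pairs whose target is never infected are driven to 0 by the projection. Each update touches only the pairs appearing in one cascade, which is what makes a single pass over the cascades cheap compared with a full-batch solver.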

4.2 Experiments on real data 

The emergence of specific information pathways often depends
on the information content of the news that is propagating [22, 28].
For example, a real-world event may occur for a limited period of
time, and thus news related to the event spreads more quickly and to
larger parts of the network around that time period. At any given
time, there are many different real-world events, topics, and content
that propagate through the Web, leading to different emerging and
vanishing information pathways, and thus to an underlying time-varying
network. In order to better understand these temporal changes, we
aim to reconstruct time-varying networks and the information
pathways for particular real-world events and topics.

Dataset description. We experiment with more than 300 million
blog posts and news articles collected from 3.3 million websites
over a period of one year, from March 2011 to February 2012. We
trace the flow of information using memes [16]. Memes are short
textual phrases (like "lipstick on a pig") that travel through the
Web. We consider each meme m as a separate information cascade
c_m. Since all documents that contain memes are time-stamped,
a cascade c_m is simply a record of the times when sites first
mentioned meme m. We extracted more than 179 million memes longer
than four words. Out of these, 34 million distinct memes appeared at
least twice, resulting in 34 million different information cascades.
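The cascade construction just described can be sketched as follows, assuming the raw data is a stream of (site, meme phrase, timestamp) mention records. The function name and record layout are hypothetical, and we interpret "appeared at least twice" as "first mentioned by at least two distinct sites", which is an assumption rather than something the text states.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Mention = Tuple[str, str, float]  # (site, meme phrase, timestamp)


def build_meme_cascades(mentions: Iterable[Mention], min_words: int = 5,
                        min_sites: int = 2) -> Dict[str, Dict[str, float]]:
    """Map each meme m to its cascade c_m: site -> time of first mention."""
    cascades: Dict[str, Dict[str, float]] = defaultdict(dict)
    for site, meme, ts in mentions:
        if len(meme.split()) < min_words:  # keep memes longer than four words
            continue
        first = cascades[meme].get(site)
        if first is None or ts < first:
            cascades[meme][site] = ts  # keep only the earliest mention per site
    # Assumption: "appeared at least twice" = mentioned by >= 2 distinct sites.
    return {m: c for m, c in cascades.items() if len(c) >= min_sites}
```

Because only the first mention per site is kept, repeated posts on the same site do not inflate a cascade.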

Experimental setup. Our aim is to consider sites that actively
spread memes over the Web. We achieve this by selecting the top
5,000 sites in terms of the number of memes they mentioned. Moreover,
we are interested in inferring dynamic networks related to particular
topics or events, so we assume we are also given a keyword
query Q related to the event or topic of interest. When inferring a
network for a given query Q, we only consider documents (and the
memes they mention) that include the keywords Q. Then, we build
information cascades using only those memes and apply the
InfoPath algorithm to infer the edges and the evolving edge
transmission rates. The edge transmission rates explain the
propagation of information related to a given topic or real-world
event Q. For each query Q we infer one network per day. Table 3
summarizes the number of sites and meme cascades for several
topics and real-world events.

Topic or news event (Q)     # sites      # memes
Amy Winehouse                 1,207      109,650
Fukushima                     1,666      383,745
Gaddafi                       1,358      440,646
Kate Middleton                1,427      191,777
NBA                           2,087    1,543,630
Occupy                        1,875      655,183
Strauss-Kahn                  1,263      204,238
Syria                         1,565      615,176

Table 3: Topic and news event statistics.
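A minimal sketch of this per-query setup follows. It assumes each document carries its site, timestamp, text, and the memes it mentions; the `Document` type and `query_cascades` function are hypothetical, and matching Q as "every keyword occurs as a case-insensitive substring" is our assumption about what "include the keywords Q" means.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Document:
    site: str
    time: float
    text: str
    memes: List[str] = field(default_factory=list)


def query_cascades(docs: List[Document], query: Iterable[str],
                   top_k: int = 5000) -> Dict[str, Dict[str, float]]:
    # 1. Keep the top_k sites by the number of memes they mention.
    mentions: Counter = Counter()
    for d in docs:
        mentions[d.site] += len(d.memes)
    top_sites = {site for site, _ in mentions.most_common(top_k)}

    # 2. Keep documents containing every keyword of Q (assumption:
    #    case-insensitive substring match), then build meme cascades.
    keywords = [k.lower() for k in query]
    cascades: Dict[str, Dict[str, float]] = defaultdict(dict)
    for d in docs:
        text = d.text.lower()
        if d.site not in top_sites or not all(k in text for k in keywords):
            continue
        for meme in d.memes:
            first = cascades[meme].get(d.site)
            if first is None or d.time < first:
                cascades[meme][d.site] = d.time
    return dict(cascades)
```

Running this once per query Q, and then splitting the cascades by day, would give the inputs for the daily networks; the site counts and cascade counts it produces are the kind of statistics reported in Table 3.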

Implementation and scalability. We developed an efficient
distributed implementation of our InfoPath algorithm in C++ based
on the network analysis library SNAP [2]. We deployed the
implementation on a cluster with 1,000 CPU cores and 6 TB of RAM.
With this setup, we considered 38 different topics/events Q. For
each topic, we inferred a time-varying network with a daily temporal
resolution for a period of one year, from March 2011 to February
2012. Each network has thousands of nodes and is based on hundreds
of thousands of cascades. Inferring 38 different time-varying
networks took less than 4 hours on our cluster. Note that this is
equivalent to solving Eq. 8 more than 13,000 times (38 × 365) for
millions of pairwise transmission rates. We also tested our algorithm
on larger datasets. For example, for the "Occupy Wall Street"
movement, we were able to infer a 43,415-node time-varying network
over a period of 18 months, from January 2011 to June 2012,
using 1,381,793 information cascades.

Visualizing the information pathways. Figure 4 plots diffusion
networks for three different 2011 world events: the Fukushima
nuclear disaster, the UK royal wedding, and the civil uprising in
Syria. Each network is shown at three different time points. Red
nodes represent mainstream media sites, and blue nodes represent
blogs [16].

Based on the figure, we draw several interesting observations.
Most often, information propagates through a core-periphery network
structure. Such a structure emerges when a few central media sites
and blogs drive the adoption of memes across the Web [10].
However, the network structure often changes dramatically over
time, and we find clusters that emerge and vanish in short periods
of time. For example, the information networks for Syria's uprising
illustrated in Figures 4(g-h) do not have any clear clustering
structure. However, on December 2, 2011 (Figure 4(i)), a cluster
suddenly emerges in the network. Further investigation reveals that
the cluster is composed of UK news sites and blogs that discuss
recently implemented EU sanctions against Syria. Generally, it is
common to observe the sudden formation of clusters of sites from
specific geographical areas. This is especially noticeable in the
information network for the Fukushima disaster, in Figures 4(a-c).
Such clusters often form due to language boundaries, since such
boundaries prevent memes from flowing across countries or
continents. Moreover, we often observe that such clusters are caused
by a common external event [22], as in the case of the UK discussion
on EU sanctions against Syria. Inferred dynamic networks can thus be
used to investigate the flow of information as well as to detect
external events that cause sudden perturbations to the diffusion
network structure.



Evolution of edge transmission rates. Next, we aim to study the
evolution of links among different types of sites. We label the nodes
in our network as mainstream media or blogs, and compute the
number of links between different types of sites over time. Figure 5
gives the results for several inferred diffusion networks for different
topics and world events. We note several interesting patterns.

The connectivity changes tend to reflect the amount of attention
that a news event or a topic triggers over time. Unexpected news
events, like the sex scandal of the director of the International
Monetary Fund, Strauss-Kahn, on May 14, 2011 in Fig. 5(g), or the
death of British singer Amy Winehouse on July 23, 2011 in Fig. 5(a),
result in a dramatic increase in the number of edges over a short
period of time. More general topics, like the NBA in Fig. 5(e),
result in a network with more stable connectivity over time. Certain
types of news sometimes spread earlier among blogs than among
mainstream media. This is especially the case for population-wide
events like the Fukushima nuclear disaster, the civil war in Libya
and the civil uprising in Syria (Fig. 5(b, c, h)). However, it happens
more frequently that the largest number of links are mainstream
media-to-mainstream media and the fewest links point from blogs
to mainstream media. These results are intuitive and consistent
with previous work [10, 16] that observed that information most
often flows from mainstream media to blogs (and rarely the other
way around). However, as we see here for population-level events
and social movements (as in the case of the civil unrest in the
Middle East), social media plays a crucial role in information
dissemination and the organization of civil movements.
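Counting links by site type, as done for Figure 5, reduces to a grouped count over each daily network. The sketch below assumes every inferred network is a list of directed (source, target) site pairs, with information flowing from source to target, and that each site carries a "media" or "blog" label; the function names and labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

Edge = Tuple[str, str]  # (source site, target site): information flows source -> target


def link_type_counts(edges: Iterable[Edge], site_type: Mapping[str, str]) -> Counter:
    """Count edges by (source type, target type), e.g. ('blog', 'media')."""
    return Counter((site_type[u], site_type[v]) for u, v in edges)


def link_type_series(networks: Mapping[int, Iterable[Edge]],
                     site_type: Mapping[str, str]) -> Dict[int, Counter]:
    """Apply link_type_counts to every daily network, keyed by day."""
    return {t: link_type_counts(edges, site_type)
            for t, edges in sorted(networks.items())}
```

Plotting each of the four (source type, target type) counts against the day index then gives one curve per link type.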

Evolution of node centrality. Having studied the dynamics of
edges in the network, we now move on to investigating the network
centrality of blogs and mainstream media sites over time for
different topics and world events. To measure the network centrality
of node S at time t, we first compute the shortest path length from
S to every other node R in the network. The centrality of node S is
then defined as

$$\sum_{R \neq S} \frac{1}{d(S,R)},$$

where d(S,R) is the shortest path length from S to R (if R is not
reachable from S, then d(S,R) = ∞, so R contributes nothing to the
sum). For networks with core-periphery structure, nodes with high
centrality are typically located in the "central" core of the network.
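The centrality above, the sum of reciprocal shortest-path distances (commonly called harmonic centrality), can be computed with one breadth-first search per node. This sketch assumes an unweighted directed network given as an adjacency list, so that d(S,R) counts hops; the function name is hypothetical.

```python
from collections import deque
from typing import Dict, List


def centrality(adj: Dict[str, List[str]], source: str) -> float:
    """Sum over R != S of 1 / d(S, R), with d measured in hops.

    Nodes unreachable from `source` have d = infinity and add 0, so the
    BFS only needs to visit the part of the network reachable from S.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(1.0 / d for node, d in dist.items() if node != source)
```

For a chain a -> b -> c, node a scores 1 + 1/2 = 1.5, while c, which reaches nobody, scores 0.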

Figure 6 plots the percentage of blogs among the top 100 most
central sites over time for eight different topics/events of 2011.
Perhaps surprisingly, we observe about the same number of
mainstream media sites and blogs among the top-100 most central
nodes for most networks: the share of blogs in the top 100 does not
typically decrease below 30% or increase above 70%. For some topics,
mainstream media are always more central (e.g., baseball and NBA
in Figures 6(a, b)). In contrast, for other topics, blogs dominate
mainstream media for significant amounts of time (e.g., Gaddafi
in Fig. 6(c)). Centrality of mainstream media and blogs can be
relatively constant (Fig. 6(a, b)) or more time-varying (Fig. 6(c, h)).
We find that a significant rise in the number of central blogs is
often temporally correlated with increasing social unrest (e.g., the
Occupy Wall Street movement in September-November 2011 in
Fig. 6(f)).
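Given a centrality score for every site on a given day, the quantity plotted in Figure 6 is the share of blogs among the k most central sites. A minimal sketch, assuming "media"/"blog" labels per site and breaking ties by site name (both our own choices; the function name is hypothetical):

```python
from typing import Mapping


def blog_share_top_k(centrality: Mapping[str, float],
                     site_type: Mapping[str, str], k: int = 100) -> float:
    """Percentage of blogs among the k most central sites (ties broken by name)."""
    ranked = sorted(centrality, key=lambda s: (-centrality[s], s))
    top = ranked[:k]
    if not top:
        return 0.0
    blogs = sum(1 for s in top if site_type[s] == "blog")
    return 100.0 * blogs / len(top)
```

Evaluating this for each daily network produces one point per day of the time series; values staying between 30 and 70 correspond to the balanced mix of media and blogs noted above.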



Additional time-varying diffusion networks for other topics and
news events are available at the supporting website.



Accuracy on real data. So far, we have used memes to trace the
flow of information over the Web and have made several qualitative
observations about the structure and dynamics of information
pathways in online media. We now proceed to evaluate InfoPath
quantitatively on real data. In the case of real data, the
ground-truth information diffusion network is impossible to obtain.
However, we can use the temporal dynamics of hyperlinks created
between news sites as a proxy for real information flow. Thus, by
observing the times when sites create hyperlinks, our goal is to
infer the 'targets' of the links (i.e., infer the hyperlink network
from the hyperlink times).
Figure 4: Time-varying diffusion networks for three different major events of 2011. Red nodes are mainstream media, and blue
nodes are blogs. Additional visualizations for other topics and events are available at the supporting website.



We proceed as follows. First, we discretize time into days and
generate one network G*(t) per day t, in which we add an edge
(u,v) if a document on site u linked to a document on site v
within the last day. Then, we build a set of hyperlink cascades.
A hyperlink cascade c_h starts when a site publishes a piece of
information and other sites then use hyperlinks to refer to it. Since
all our documents/posts are time-stamped, we can trace the
hyperlinks in the reverse direction and obtain information cascades.
We extracted almost 0.5 million hyperlink cascades from 3.3 million
websites from July 2011 to December 2011. Our aim is to use the
hyperlink cascades to infer the time-varying network G*(t). We
then evaluate how many edges InfoPath estimates correctly by
computing Accuracy, Precision and Recall for each day.
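Both data structures in this procedure can be sketched in a few lines. We assume site-level link records of the form (linking site u, linked site v, time in days), bucketed by calendar day as our reading of "within the last day", and document-level posts with a publish time. Grouping each hyperlink cascade by the linked-to document is also our assumption. All names here are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple

SiteLink = Tuple[str, str, float]  # (linking site u, linked site v, time in days)


def daily_hyperlink_networks(links: Iterable[SiteLink]) -> Dict[int, Set[Tuple[str, str]]]:
    """G*(t): edge (u, v) if a document on u linked to a document on v during day t."""
    networks: Dict[int, Set[Tuple[str, str]]] = defaultdict(set)
    for u, v, t in links:
        networks[math.floor(t)].add((u, v))
    return dict(networks)


def hyperlink_cascades(posts: Mapping[str, Tuple[str, float]],
                       doc_links: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """One cascade per linked-to document: site -> earliest time it joined.

    posts maps doc_id -> (site, publish time); doc_links holds
    (linking doc, linked doc) pairs, traced in reverse to the source.
    """
    cascades: Dict[str, Dict[str, float]] = defaultdict(dict)
    for src_doc, dst_doc in doc_links:
        for doc in (dst_doc, src_doc):  # the linked-to post starts the cascade
            site, t = posts[doc]
            first = cascades[dst_doc].get(site)
            if first is None or t < first:
                cascades[dst_doc][site] = t
    return dict(cascades)
```

The cascades are the input to InfoPath, and the daily edge sets serve as the ground truth G*(t) against which the inferred networks are scored.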

Figure 7 shows Precision, Recall, and Accuracy over time for
a time-varying hyperlink network with 11,461 nodes and 19,915
edges created over time, using 495,655 hyperlink cascades from
July 2011 to December 2011. We assume an exponential edge
transmission model. We observe a weekly periodicity and overall
encouraging performance, around 0.4 to 0.5 for all three
performance metrics.



5. CONCLUSION 

All previous network inference algorithms have assumed diffu- 
sion networks to be static. Therefore, they have considered the 
pathways over which information propagates to be static over time. 
In contrast, we developed an algorithm for time-varying network 
inference, InfoPath. Our algorithm provides on-line time-varying 
estimates of the edges of the network as well as the dynamic edge 
transmission rates, which allows us to detect how information path- 
ways emerge and vanish over time. 

We evaluated our algorithm on synthetic data and demonstrated 
that InfoPath successfully tracks changes in the topology of dy- 
namic networks, provides accurate on-line estimates of the time- 
varying edge transmission rates and is also robust across network 
topologies, edge transmission models and patterns of evolution of 
edge transmission rates. 

We also ran InfoPath on real data and investigated how real
networks and information pathways evolve over time. We found
that the information pathways over which general recurrent topics
propagate remain relatively stable across time. In contrast, major
real-world events lead to dramatic changes and shifts in the
information pathways. We observed that clusters of mainstream news
and blogs often emerge and vanish in a matter of days. We discovered
that, for news involving the general population and social unrest,
such as the Libyan civil war, the Egyptian revolution, Syria's
uprising and the Occupy Wall Street movement, there is an earlier
and greater increase in information transfer among blogs than
among mainstream media.

Figure 5: The number of links that point between different types of sites over time for eight different inferred diffusion networks.
We split the sites into mainstream media and blogs and count the links among these two node types.

Figure 7: Precision, Recall, and Accuracy of InfoPath for a
time-varying hyperlink network with 11,461 nodes and 19,915
edges over time, using 495,655 hyperlink cascades from July
2011 to December 2011.

Our work also opens various avenues for future work. For example,
a rigorous theoretical analysis of the convergence of our stochastic
gradient descent method would provide further insight into its
performance. Moreover, we notice that the changes in the inferred
network structure can often be attributed to sudden external
real-world events. This raises two interesting questions. How
can diffusion network inference be combined with methods for
detecting external influence in networks [22]? And how can
dynamic network inference be extended to detect unexpected
real-world events based on a stream of documents? Last, often
not only information but also the sentiment attached to a piece
of information spreads through the network [19]. It would be
interesting to consider inference of signed networks, where the
positive/negative valence of an edge models the sentiment
relationship between a pair of nodes. Overall, such methods would
allow us to improve our understanding of the current landscape of
news coverage, the role that news media play in framing the
discussion of important topics, and the evolving ecosystem that
news media occupy.